This paper presents the OPUS ecosystem with a focus on the development of open machine translation models and tools, and their integration into end-user applications, development platforms and professional workflows. We discuss our on-going mission of increasing language coverage and translation quality, and also describe on-going work on the development of modular translation models and speed-optimized compact solutions for real-time translation on regular desktops and small devices.
Vulnerability to adversarial attacks is a well-known weakness of Deep Neural Networks. While most studies focus on natural images with standardized benchmarks like ImageNet and CIFAR, little research has considered real-world applications, in particular in the medical domain. Our research shows that, contrary to previous claims, the robustness of chest x-ray classification is much harder to evaluate and leads to very different assessments depending on the dataset, the architecture, and the robustness metric. We argue that previous studies did not take into account the peculiarities of medical diagnosis, such as the co-occurrence of diseases, the disagreement of labellers (domain experts), the threat model of the attacks, and the risk implications of each successful attack. In this paper, we discuss the methodological foundations, review the pitfalls and best practices, and suggest new methodological considerations for evaluating the robustness of chest x-ray classification models. Our evaluation on 3 datasets, 7 models, and 18 diseases is the largest evaluation of the robustness of chest x-ray classification models.
We propose a clustering procedure to group K populations into subgroups with the same dependence structure. The method is adapted to paired populations and can be used with panel data. It relies on the differences between the orthogonal projection coefficients of the K copula densities estimated from the K populations. Each cluster then consists of populations having significantly similar dependence structures. A recent test statistic from Ngounou-Bakam and Pommeret (2022) is used to construct such clusters automatically. The procedure is data-driven and depends on the asymptotic level of the test. We illustrate our clustering algorithm via numerical studies and on two real datasets: a panel of financial datasets and an insurance dataset of losses and allocated loss adjustment expenses.
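As a rough illustration of the projection idea, the sketch below assumes an orthonormal shifted Legendre basis on [0, 1] truncated at degree 3 and estimates each coefficient a_jk = E[L_j(U) L_k(V)] from rank-based pseudo-observations. The Ngounou-Bakam and Pommeret (2022) test is not reproduced: a fixed, hypothetical distance threshold between coefficient vectors stands in for it, and all function names are illustrative.

```python
import math
import random


def legendre01(j, u):
    """Orthonormal shifted Legendre polynomial of degree j (1 to 3) on [0, 1]."""
    x = 2.0 * u - 1.0
    if j == 1:
        return math.sqrt(3) * x
    if j == 2:
        return math.sqrt(5) * (3 * x ** 2 - 1) / 2
    if j == 3:
        return math.sqrt(7) * (5 * x ** 3 - 3 * x) / 2
    raise ValueError("this sketch only supports degrees 1 to 3")


def pseudo_obs(values):
    """Rank-based pseudo-observations rank / (n + 1); ties are not handled."""
    n = len(values)
    ranks = [0] * n
    for rank, idx in enumerate(sorted(range(n), key=values.__getitem__), start=1):
        ranks[idx] = rank
    return [r / (n + 1) for r in ranks]


def copula_coefficients(pairs, degree=3):
    """Estimate a_jk = E[L_j(U) L_k(V)] for 1 <= j, k <= degree."""
    xs, ys = zip(*pairs)
    u, v = pseudo_obs(xs), pseudo_obs(ys)
    n = len(u)
    return [
        sum(legendre01(j, ui) * legendre01(k, vi) for ui, vi in zip(u, v)) / n
        for j in range(1, degree + 1)
        for k in range(1, degree + 1)
    ]


def cluster_populations(samples, threshold=0.5):
    """Greedy grouping (hypothetical stand-in for the test): join the first
    cluster whose seed population's coefficients are within `threshold`."""
    coeffs = [copula_coefficients(s) for s in samples]
    clusters = []
    for idx, c in enumerate(coeffs):
        for cluster in clusters:
            if math.dist(c, coeffs[cluster[0]]) < threshold:
                cluster.append(idx)
                break
        else:
            clusters.append([idx])
    return clusters


def make_dependent(rng, n=500):
    """Toy population with strong positive dependence."""
    pairs = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        pairs.append((x, x + rng.gauss(0.0, 0.1)))
    return pairs


def make_independent(rng, n=500):
    """Toy population with independent margins."""
    return [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(n)]
```

With this basis, a_11 = 12 E[(U - 1/2)(V - 1/2)] is Spearman's rho, so near-comonotone samples give diagonal coefficients close to 1 while independent samples give coefficients close to 0; populations of the two kinds therefore land in different clusters.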
Causal discovery, the inference of causal relations from data, is a core task of fundamental importance in all scientific domains, and several new machine learning methods for addressing the causal discovery problem have recently been proposed. However, existing machine learning methods for causal discovery typically require that the data used for inference be pooled and available in a centralized location. In many domains of high practical importance, such as healthcare, data are only available at local data-generating entities (e.g., hospitals in the healthcare context) and cannot be shared across entities due to privacy and regulatory constraints, among other reasons. In this work, we address the problem of inferring causal structure, in the form of a directed acyclic graph (DAG), from a distributed data set that contains both observational and interventional data, in a privacy-preserving manner by exchanging updates instead of samples. To this end, we introduce a new federated framework, FED-CD, that enables the discovery of global causal structures both when the set of intervened covariates is the same across decentralized entities and when the sets of intervened covariates are potentially disjoint. We perform a comprehensive experimental evaluation on synthetic data demonstrating that FED-CD enables effective aggregation of decentralized data for causal discovery without direct sample sharing, even when the contributing distributed data sets cover disjoint sets of interventions. Effective methods for causal discovery in distributed data sets could significantly advance scientific discovery and knowledge sharing in important settings, for instance healthcare, in which sharing data across local sites is difficult or prohibited.
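The abstract does not describe FED-CD's update rule, so the following is only a hypothetical illustration of the "exchange updates, not samples" idea. Each site is assumed to share a local matrix of edge scores (how these are computed, and how interventional data enter them, is left open); the server forms a sample-size-weighted average in the style of federated averaging, thresholds it, and checks that the result is a DAG. All names and the threshold are illustrative.

```python
from collections import deque


def federated_edge_average(site_scores, site_sizes):
    """Sample-size-weighted average of per-site d x d edge-score matrices (FedAvg-style).

    Only these score matrices leave each site; raw samples never do.
    """
    total = sum(site_sizes)
    d = len(site_scores[0])
    return [
        [sum(n * s[i][j] for s, n in zip(site_scores, site_sizes)) / total for j in range(d)]
        for i in range(d)
    ]


def to_adjacency(avg, tau=0.5):
    """Keep edge i -> j when its averaged score exceeds tau (self-loops dropped)."""
    d = len(avg)
    return [[1 if i != j and avg[i][j] > tau else 0 for j in range(d)] for i in range(d)]


def is_acyclic(adj):
    """Kahn's algorithm: the graph is a DAG iff every node can be removed in topological order."""
    d = len(adj)
    indegree = [sum(adj[i][j] for i in range(d)) for j in range(d)]
    ready = deque(j for j in range(d) if indegree[j] == 0)
    removed = 0
    while ready:
        i = ready.popleft()
        removed += 1
        for j in range(d):
            if adj[i][j]:
                indegree[j] -= 1
                if indegree[j] == 0:
                    ready.append(j)
    return removed == d
```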
To apply federated learning to drug discovery, we developed a novel platform in the context of the European Innovative Medicines Initiative (IMI) project MELLODDY (grant no. 831472), which comprised 10 pharmaceutical companies, academic research labs, large industrial companies, and startups. The MELLODDY platform was the first industry-scale platform to enable the creation of a global federated model for drug discovery without sharing the confidential data sets of the individual partners. The federated model was trained on the platform by aggregating the gradients of all contributing partners in a cryptographically secure way after each training iteration. The platform was deployed on an Amazon Web Services (AWS) multi-account architecture running Kubernetes clusters in private subnets. Organisationally, the roles of the different partners were codified as different rights and permissions on the platform and administered in a decentralized way. The MELLODDY platform generated new scientific discoveries, which are described in a companion paper.
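The abstract does not name the cryptographic protocol. One well-known way to aggregate gradients so that the server only learns their sum is pairwise additive masking, as used in secure aggregation schemes; the sketch below shows that idea for fixed-point-encoded gradients. It is not MELLODDY's implementation: the modulus, scale, and function names are assumptions, and seeding Python's `random` module stands in for a real key agreement and cryptographic PRG.

```python
import random

MODULUS = 2 ** 32   # arithmetic is done modulo 2^32 so masks cancel exactly
SCALE = 10 ** 6     # fixed-point scale: 6 decimal digits of precision


def encode(x):
    """Map a float to a fixed-point residue mod MODULUS (negatives wrap around)."""
    return round(x * SCALE) % MODULUS


def decode(v):
    """Inverse of encode for values whose magnitude stays below MODULUS / (2 * SCALE)."""
    if v >= MODULUS // 2:
        v -= MODULUS
    return v / SCALE


def pair_mask(seed, dim):
    # Illustration only: a seeded Mersenne Twister is NOT cryptographically secure.
    rng = random.Random(seed)
    return [rng.randrange(MODULUS) for _ in range(dim)]


def masked_update(party, grads, pair_seeds):
    """Party i adds the mask shared with each j > i and subtracts the mask shared with each j < i."""
    out = [encode(g) for g in grads]
    for (i, j), seed in pair_seeds.items():
        if party not in (i, j):
            continue
        sign = 1 if party == i else -1
        mask = pair_mask(seed, len(grads))
        out = [(o + sign * m) % MODULUS for o, m in zip(out, mask)]
    return out


def aggregate(masked_updates):
    """Server side: summing all masked vectors cancels every pairwise mask."""
    return [decode(sum(col) % MODULUS) for col in zip(*masked_updates)]
```

Each individual masked vector looks uniformly random to the server, but the masks cancel in the sum, so with gradients `[[0.5, -1.25], [0.25, 0.75], [-0.125, 2.0]]` from three parties the server recovers only the total `[0.625, 1.5]`.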
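This paper develops a methodology for collaborative human-robot exploration that leverages implicit coordination. Most autonomous single- and multi-robot exploration systems require a remote operator to provide explicit guidance to the robotic team. Few works consider how to embed the human partner alongside the robots to provide guidance in the field. A remaining challenge for human-robot exploration is the efficient communication of goals from the human to the robot. In this paper, we develop a methodology that implicitly communicates regions of interest from a helmet-mounted depth camera on the human's head to the robot, together with an information gain-based exploration objective that biases motion planning within the viewpoint provided by the human. The result is an aerial system that safely accesses regions of interest that may not be immediately viewable or reachable by the human. The approach is evaluated in simulation and in hardware experiments in a motion capture arena. Videos of the simulation and hardware experiments are available at: https://youtu.be/7jgkbpvfioe.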
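The abstract does not give the exploration objective in closed form. A minimal sketch, assuming a 2D cone as a stand-in for the helmet camera's field of view, precomputed information-gain values for candidate viewpoints, and a hypothetical multiplicative bias weight, shows how such a bias can steer goal selection toward what the human is looking at. `HumanView`, `biased_score`, and `select_goal` are illustrative names.

```python
import math
from dataclasses import dataclass


@dataclass
class HumanView:
    """Hypothetical 2D model of the human's helmet-camera view."""
    x: float
    y: float
    heading: float    # viewing direction, radians
    half_fov: float   # half of the field-of-view angle, radians
    max_range: float  # sensing range, metres


def in_view(view, px, py):
    """True if point (px, py) lies inside the view cone."""
    dx, dy = px - view.x, py - view.y
    dist = math.hypot(dx, dy)
    if dist == 0.0 or dist > view.max_range:
        return False
    # Wrap the bearing error into [-pi, pi) before comparing with the half-angle.
    err = (math.atan2(dy, dx) - view.heading + math.pi) % (2 * math.pi) - math.pi
    return abs(err) <= view.half_fov


def biased_score(candidate, view, bias):
    """Information gain, boosted by (1 + bias) when the candidate lies in the human's view."""
    x, y, gain = candidate
    return gain * (1.0 + bias) if in_view(view, x, y) else gain


def select_goal(candidates, view, bias=1.0):
    """Pick the (x, y, gain) candidate with the highest biased score."""
    return max(candidates, key=lambda c: biased_score(c, view, bias))
```

With a human at the origin looking along +x, a candidate at (5, 0) with gain 2.0 beats an out-of-view candidate at (0, 5) with gain 3.0 when `bias=1.0`, while `bias=0.0` reduces to pure information-gain exploration.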
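This paper improves safe teleoperation of multirotors by developing a hierarchical collision avoidance method that modulates the maximum speed based on environment complexity and perceptual constraints. Safe speed modulation is challenging in environments that exhibit varying clutter. Existing methods fix the maximum speed and the map resolution, which prevents vehicles from entering narrow spaces and places the cognitive load of choosing the speed on the operator. We address these gaps by proposing a high-rate (10 Hz) teleoperation approach that modulates the maximum vehicle speed through hierarchical collision checking. The hierarchical collision checker simultaneously adapts the voxel size of the local map and the maximum vehicle speed to ensure that motion planning remains safe. The proposed approach is evaluated in simulation and real-world experiments and compared with a non-adaptive motion primitive-based teleoperation approach. The results demonstrate the advantages of the proposed approach and its ability to complete the task without requiring the user to specify a maximum vehicle speed.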
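One way to read the hierarchical collision check is as a search over paired (voxel size, speed limit) levels. The sketch below assumes that reading, a 2D occupancy set built from obstacle points, and hypothetical level values; it returns the fastest level whose voxelized map admits a sampled motion primitive. Because coarse voxels over-approximate obstacles, a narrow gap only opens up at a finer resolution, where the speed limit is lower. `LEVELS` and the function names are illustrative.

```python
import math

# Hypothetical (voxel size [m], max speed [m/s]) levels, coarse and fast first.
LEVELS = [(1.0, 3.0), (0.5, 1.5), (0.25, 0.75)]


def voxel_key(x, y, size):
    """Integer index of the voxel containing (x, y) at the given resolution."""
    return (math.floor(x / size), math.floor(y / size))


def collides(samples, obstacle_points, size):
    """A sample collides if it falls in any voxel that contains an obstacle point,
    so coarser voxels over-approximate obstacles."""
    occupied = {voxel_key(x, y, size) for x, y in obstacle_points}
    return any(voxel_key(x, y, size) in occupied for x, y in samples)


def select_level(samples, obstacle_points, levels=LEVELS):
    """Return the fastest (voxel size, max speed) level whose check passes, or None."""
    for size, vmax in levels:
        if not collides(samples, obstacle_points, size):
            return size, vmax
    return None
```

For a straight primitive through a 0.8 m gap, the 1.0 m and 0.5 m levels report a collision and the checker falls back to 0.25 m voxels at 0.75 m/s; in open space the coarsest, fastest level is kept, and a blocked primitive yields `None`.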
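As COVID-19 spread around the world, fast and precise automated triage mechanisms became necessary to reduce human effort, for example for image-based diagnosis. Although the literature has shown promising efforts in this direction, the reported results do not take into account the variability of CT scans acquired under different conditions, rendering the resulting models unsuitable for data acquired with, for example, different scanner technologies. While COVID-19 diagnosis can now be performed efficiently using PCR tests, this use case exemplifies the need for a methodology that overcomes data variability issues in order to make medical image analysis models more widely applicable. In this paper, we explicitly address the variability issue using the example of COVID-19 diagnosis and propose a novel generative approach that aims at removing the differences stemming from, for example, imaging technologies, while simultaneously introducing minimal changes to the CT scans, by leveraging the idea of deep autoencoders. The proposed preprocessing architecture (PrepNet) (i) is jointly trained on multiple CT scan datasets and (ii) is capable of extracting improved discriminative features for better diagnosis. Experimental results on three public datasets (SARS-COVID-2, UCSD COVID-CT, MosMed) show that our model improves cross-dataset generalization by up to 11.84 percentage points, despite a slight decrease in within-dataset performance.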
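Practical image segmentation tasks concern images that must be reconstructed from noisy, distorted, and/or incomplete observations. A recent approach to such tasks is to perform this reconstruction jointly with the segmentation, using each to guide the other. However, this work has so far employed relatively simple segmentation methods, such as the Chan-Vese algorithm. In this paper, we present a method for joint reconstruction-segmentation using graph-based segmentation methods, which have seen increasing interest recently. Complications arise from the large sizes of the matrices involved, and we show how these complications can be managed. We then analyse the convergence properties of our scheme. Finally, we apply this scheme to distorted versions of the "two cows" images familiar from the previous graph-based segmentation literature, first to a highly noised version and second to a blurred version, achieving highly accurate segmentations in both cases. We compare these results with those obtained by sequential reconstruction-segmentation approaches, finding that our method competes with, or even outperforms, those approaches in terms of both reconstruction and segmentation accuracy.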
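The abstract does not name the specific graph-based segmentation method. As a hypothetical stand-in, the sketch below performs a spectral bipartition: it builds a dense Gaussian similarity graph on pixel intensities, approximates the Fiedler vector of the graph Laplacian L = D - W by power iteration, and labels pixels by its sign. The joint reconstruction step is not modelled, and `sigma`, `iters`, and the function name are assumptions. The dense n x n weight matrix also illustrates the matrix-size problem mentioned in the abstract, since its size grows quadratically with the number of pixels.

```python
import math
import random


def fiedler_bipartition(values, sigma=0.2, iters=300, seed=0):
    """Two-class graph segmentation from the sign of the graph Laplacian's Fiedler vector."""
    n = len(values)
    # Dense Gaussian similarity graph on intensities (n x n entries, hence quadratic memory).
    w = [[0.0 if i == j else math.exp(-((values[i] - values[j]) ** 2) / sigma ** 2)
          for j in range(n)] for i in range(n)]
    degree = [sum(row) for row in w]
    # Gershgorin: every eigenvalue of L = D - W lies in [0, 2 * max degree], so
    # M = c*I - L is positive semidefinite and its top eigenvector is the constant one.
    c = 2.0 * max(degree)
    rng = random.Random(seed)
    v = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    for _ in range(iters):
        mean = sum(v) / n
        v = [x - mean for x in v]  # project out the constant eigenvector (eigenvalue 0 of L)
        lv = [degree[i] * v[i] - sum(w[i][j] * v[j] for j in range(n)) for i in range(n)]
        mv = [c * v[i] - lv[i] for i in range(n)]  # apply M = c*I - L
        norm = math.sqrt(sum(x * x for x in mv))
        v = [x / norm for x in mv]
    # After convergence v approximates the Fiedler vector; its sign gives the two segments.
    return [1 if x >= 0 else 0 for x in v]
```

On an interleaved signal such as `[0.1, 0.9, 0.12, 0.88, 0.08, 0.93, 0.11, 0.87]`, all dark pixels receive one label and all bright pixels the other, regardless of their positions.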
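We propose transferability from Large Geometric Vicinity (LGV), a new technique to increase the transferability of black-box adversarial attacks. LGV starts from a pretrained surrogate model and collects multiple weight sets from a few additional training epochs with a constant and high learning rate. LGV exploits two geometric properties that we relate to transferability. First, models that belong to a wider weight optimum are better surrogates. Second, we identify a subspace able to generate an effective surrogate ensemble among this wider optimum. Through extensive experiments, we show that LGV alone outperforms all (combinations of) four established test-time transformations. Our findings shed new light on the importance of geometry in explaining the transferability of adversarial examples.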
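A minimal sketch of the LGV recipe on a toy linear surrogate follows. It assumes a logistic-regression model, one weight snapshot per extra epoch, and a single FGSM step against the ensemble-averaged input gradient. The paper's surrogates are deep networks and its attack configuration may differ; the names and hyperparameters here are illustrative.

```python
import math
import random


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict(w, b, x):
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def sgd_epoch(w, b, data, lr, rng):
    """One epoch of plain SGD on the logistic loss of a linear surrogate."""
    order = list(range(len(data)))
    rng.shuffle(order)
    for idx in order:
        x, y = data[idx]
        err = predict(w, b, x) - y  # derivative of the loss w.r.t. the logit
        w = [wi - lr * err * xi for wi, xi in zip(w, x)]
        b -= lr * err
    return w, b


def collect_lgv_weights(w, b, data, epochs=5, lr=1.0, seed=0):
    """From pretrained (w, b), keep training at a constant, high learning rate and
    store one weight snapshot per epoch (the snapshot schedule is an assumption)."""
    rng = random.Random(seed)
    snapshots = []
    for _ in range(epochs):
        w, b = sgd_epoch(w, b, data, lr, rng)
        snapshots.append((list(w), b))
    return snapshots


def fgsm_on_ensemble(x, y, snapshots, eps):
    """One FGSM step along the sign of the loss gradient averaged over the ensemble."""
    grad = [0.0] * len(x)
    for w, b in snapshots:
        err = predict(w, b, x) - y
        for i, wi in enumerate(w):
            grad[i] += err * wi / len(snapshots)
    return [xi + eps * ((g > 0) - (g < 0)) for xi, g in zip(x, grad)]
```

The constant, high learning rate during collection (here 1.0, versus a smaller rate during pretraining) is intended to keep successive snapshots spread across the optimum instead of converging to a single point. The adversarial example is then crafted against the whole snapshot ensemble rather than one surrogate.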